ABSTRACT

The Design of Information Systems in the Social
Sciences (DISISS) is a research project with the objective of
carrying out research necessary for the effective design of
information systems in the social sciences, whether by the creation
of new systems or the modification of existing systems. This working
paper explores the methods available for the grouping and linking of
journal articles by citations, and the use of these methods as a
means of information retrieval. The study is not only speculative but
exploratory, as the data reported was derived from pilot studies
conducted on a very small scale. If the concepts expressed here are
considered valid and useful, large-scale tests need to be carried
out, with machine-readable files. Such tests will not be possible
during the DISISS project, but they could well form part of further
research. Not all of the uses of citation data concern retrieval, and
not all retrieval by means of citation data involves networks. This
paper mentions some of these other uses for the sake of completeness.
(Related reports are LI004401 and 004402.) (Author/SJ)
PREFACE

DISISS (Design of Information Systems in the Social Sciences) is a
research project based at the University of Bath. The objective of the project
is to carry out research necessary for the effective design of information systems in
the social sciences, whether by the creation of new systems or the modification of
existing systems. The project, which is funded by OSTI, commenced in
January 1971.

Work on other parts of the project is being reported in a series of working
papers which are listed in Appendix C. These, together with an Outline of work
carried out in 1971 and 1972, can be obtained from the Library, University of Bath,
Claverton Down, Bath, BA2 7AY.

The present working paper explores the methods available for the grouping
and linking of journal articles by citations, and the use of these methods as a means
of information retrieval. The study is not only speculative but exploratory, as the
data reported was derived from pilot studies conducted on a very small scale. If
the concepts expressed here are considered valid and useful, large-scale tests need to
be carried out, with machine-readable files. Such tests will not be possible during
the DISISS project, but they could well form part of further research.


Because, so far as we know, the approach put forward here is new, comments
on this paper would be especially welcome. Although the data used relates to the
social sciences, the method is of course applicable in any discipline.

Not all of the uses of citation data concern retrieval, and not all retrieval
by means of citation data involves networks. This paper mentions some of these
other uses for the sake of completeness.

The outline of this paper was prepared by Michael Brittain, but
Barbara Skelton was largely responsible for the drafting. Drafts were read, and
numerous contributions made, by Maurice Line, Stephen Roberts and Robert Bradshaw.



1.0 INTRODUCTION 



This working paper explores the potential uses of citation analyses for
the retrieval of information, and in particular the use of citation networks to identify
groups of journal articles.

Citation data has been analysed since the 1920s as a guide in the
selection of journals for library collections, and to a lesser extent in determining the
coverage of secondary services. These uses will probably increase as optimisation
of information systems becomes more necessary. Users of information, however, are
generally much more concerned with retrieving relevant journal articles, and are not
directly concerned with the selection policies of either libraries or secondary services.

In its simplest form, a citation analysis involves the collection and counting
of citations in a limited number of journals (referred to as 'source items'); this results
in a list of cited journal titles, or authors, ranked by their frequency of citation.
Lists of cited journal articles have rarely been produced, because the small samples of
source journals typically used yield very few articles that are cited frequently. The
relationship between source items and cited items is not usually explored in studies
of this kind.

By analysing the relationship between source items and cited items, it is
possible to derive a structure of linkages, referred to in this paper as a citation
network. A citation network reveals the inter-relationships of articles; it shows both
the frequency of citation of each article and the actual source articles that make the
citations.

Considerable work has been done on the nature, value and use of citations.
A recent review of the literature by Hall (1970) indicates that the majority of the work
has been concerned with citation counts. A well-known work in this area is that of
Brown (1956), who reported lists of frequently cited journals in mathematics, physics,
chemistry, geology, physiology, botany, zoology and entomology. These lists
have from time to time been used as a guide to the selection of journals by libraries.
Basically, the same principles are involved in using a frequency list of monographs,
although monographs have received much less attention, largely because the frequency
with which most of them are cited is relatively low. Recently, more refined measures
of journal citation have been attempted. For example, the number of citations
received by a journal can be expressed as a ratio to the average number of
articles it publishes each year (Garfield 1971); Garfield has called this the 'impact
factor'. A further recent use of citation data has been the application of clustering
techniques (Price and Schiminovich 1968). An important part of DISISS research
is the clustering of journal titles to produce groups of titles related by their citations.
This work will be reported in a further working paper.
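As a rough illustration of the ratio attributed to Garfield (1971) above, the sketch below divides the citations a journal receives by the average number of articles it publishes per year. It is a simplified, hypothetical reading: the citation window and counting rules Garfield used are not specified here, and the function name `citation_ratio` and the figures in the example are invented for illustration.

```python
from typing import Sequence


def citation_ratio(citations_received: int, articles_per_year: Sequence[int]) -> float:
    """Citations received by a journal divided by its average yearly article output.

    A simplified sketch of the ratio described in the text; which years of
    citations and of articles are counted is left to the caller (an assumption).
    """
    if not articles_per_year:
        raise ValueError("need at least one year of article counts")
    average_output = sum(articles_per_year) / len(articles_per_year)
    if average_output == 0:
        raise ValueError("the journal published no articles in the period")
    return citations_received / average_output


# Hypothetical journal: 240 citations received; 50 and 70 articles published
# in the two years considered, i.e. 60 articles per year on average.
print(citation_ratio(240, [50, 70]))  # → 4.0
```

Dividing by average output rather than by raw citation counts alone is what lets a small journal be compared with a large one.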

Little work has been carried out with journal articles (as opposed to journal
titles). Kessler (1963) analysed the relationship between articles by grouping together
articles with citations in common; he calls this 'bibliographical coupling'. Garfield
(1963 and 1970), Garfield, Sher and Torpie (1964) and Price (1965) have used
citation data to trace the structure of knowledge and the flow of information. All of
these studies have necessitated the construction of citation networks in one form or
another, but they have been mainly concerned with the interpretation of the network
rather than with the method of its construction.

There have been a few evaluative studies on the use of citation indexes for
information retrieval. The best known citation index is Science Citation Index
(SCI), which has been used as a data base for the construction of the networks described
in this paper. Waldhart (1964) compared the compilation of a bibliography on
lasers from SCI and from five conventional indexing and abstracting services,
concluding that SCI produced more unique references than the other services.
Spencer (1967) conducted a similar test whilst compiling a bibliography on the drug
thalidomide; again, SCI produced more unique references. Martyn (1965) conducted
a small test on SCI, tracing the subject of gallium as a semi-conductor; he found that
for every two relevant articles he retrieved five irrelevant articles. The 'noise' in
the system is therefore quite high, but Martyn concluded that if he had searched
using article titles rather than all the citations from each article, the number of
irrelevant references might have been much reduced. These studies examine
specifically the use of citation indexes, not the use of citation networks with
which this paper is mainly concerned; but evaluative studies on the use of networks
cannot of course be carried out until networks have been constructed and tested.

The main objective of this paper is to examine the method and procedures for
the construction of citation networks in order to identify groups of articles. Several
pilot studies were undertaken, first to examine the method for construction of citation
networks, and secondly to evaluate the effectiveness of generating groups of articles.
The networks in these studies were compiled by hand. If articles are to be retrieved
on a large scale, the methods must be suitable for machine handling. This working
paper does not attempt to cover the details of the file structure for machine handling,
although some implications for machine requirements are mentioned. The methods
and procedures described in this paper are based on 'hand-drawn' networks; once
the methods have been explored, large-scale tests can then be undertaken. However,
these tests cannot be carried out during the present DISISS project.

The working paper is arranged in the following manner. Section 2 considers
the analysis of data for citation networks and describes methods for their construction.
Three methods have been developed; they are described in this paper as CITED search,
SOURCE search and DUAL search. Section 3 discusses briefly some applications of
citation networks for information services. The networking technique allows great
flexibility for the retrieval of information, and in particular DISISS has considered the
packaging of information based on citation networks. The pilot studies are reported
in section 4. Two groups of studies were conducted: i) those to identify groups of
articles within a subject, and ii) those to identify groups of articles that deal with new
concepts and trends. The latter was felt to be important, for any information system in
the social sciences must deal adequately with the 'soft' terminology. Finally, in
section 5, brief details are given of further work DISISS hopes to carry out on citation
networks.



2.0 CITATION NETWORKS 

2.1 Ways of analysing citation data to show relationships, and to develop networks

There are various methods for the analysis of citation data which are of
potential value for library and information system design. As already mentioned, the
simplest method, and that most commonly used, consists of counting citations from
given source journals, and constructing a frequency list of cited journals. Such a
list may be used as a guide for library holdings. In this type of analysis, the
relationship between source and cited journal is ignored once the frequency list has
been compiled. The analysis may be described as a one-step process. Figure 1
illustrates in diagrammatic form the stages of the analysis. A, B and C are selected
source journals and the cited journals taken from these are represented by D to K.
Source journal A cites D, E, F, J and K. Source journal B cites F, G, H and I.
Source journal C cites I, J and K. Journals D to K can be ordered into a ranked list
according to citation frequency.
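The one-step analysis of Figure 1 can be sketched in a few lines. This is a minimal illustration, assuming the citations are held as a mapping from each source journal to the journals it cites; the name `frequency_list` and the alphabetical tie-breaking rule are choices made for the sketch, not part of the method described.

```python
from collections import Counter

# Figure 1 as described in the text: each source journal and the journals it cites.
FIGURE_1 = {
    "A": ["D", "E", "F", "J", "K"],
    "B": ["F", "G", "H", "I"],
    "C": ["I", "J", "K"],
}


def frequency_list(citations):
    """One-step analysis: count how often each item is cited and rank the items.

    Which source cited which item is discarded once the counts are made.
    Ties are broken alphabetically so that the ranking is reproducible.
    """
    counts = Counter(cited for cited_items in citations.values() for cited in cited_items)
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


for journal, count in frequency_list(FIGURE_1):
    print(journal, count)
# F, I, J and K head the list with 2 citations each; D, E, G and H follow with 1.
```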

An off-shoot of this analysis has been explored by Kessler (1963). He has
investigated the relationship of the number of citations in common between articles.
He has called the relationship 'bibliographical coupling' and has related the strength
of the relationship to the number of citations in common. The source articles related
in this manner are assumed to have a similarity of content. In Figure 1, A to K can
represent articles, not journals. Article A cites articles J and K, which are also cited
by article C. Article F is cited by both articles A and B, and article I by articles B
and C. There is therefore a relationship between all three source articles, but the one
between A and C is strongest because they have two citations in common, articles
J and K. Articles B and C and articles A and B each cite only one article in common.
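Bibliographical coupling strength, as described above, is simply the size of the overlap between two reference lists. The sketch below reads Figure 1 as articles and computes the strength for every pair of source articles; holding reference lists in a dictionary and the name `coupling_strengths` are illustrative assumptions, not Kessler's own procedure.

```python
from itertools import combinations

# Figure 1 read as articles: each source article and the articles it cites.
FIGURE_1 = {
    "A": ["D", "E", "F", "J", "K"],
    "B": ["F", "G", "H", "I"],
    "C": ["I", "J", "K"],
}


def coupling_strengths(citations):
    """Bibliographical coupling: count the cited items shared by each pair of sources.

    Pairs with nothing in common are omitted.
    """
    strengths = {}
    for first, second in combinations(sorted(citations), 2):
        shared = set(citations[first]) & set(citations[second])
        if shared:
            strengths[(first, second)] = len(shared)
    return strengths


print(coupling_strengths(FIGURE_1))
# → {('A', 'B'): 1, ('A', 'C'): 2, ('B', 'C'): 1}
```

The output reproduces the reasoning in the text: A and C share J and K, so their coupling is the strongest.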

The third type of analysis may be described as a two-step process. It involves
the application of cluster techniques to a frequency list of cited journals. This type
of analysis seeks to divide the set of cited journals into subject groupings. The
underlying assumption is that journals which deal with the same subject areas
will cite one another. This analysis depends upon the relative frequency of citation
of each journal in relation to the source journals. The basic measure by which journal
clusters may be judged is the similarity between journals in the same cluster. Once
clusters have been identified, journals within a cluster may be arranged in a
hierarchy according to citation frequency. Narin (1972) has applied cluster
analysis techniques to journals in the disciplines of physics, chemistry and molecular
biology, and identified journal groups in subdisciplinary subject areas.
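The text does not name a similarity measure or a clustering algorithm, so the following is only one possible sketch. It assumes each cited journal is described by how often it is cited by each source journal, uses cosine similarity between these profiles, and groups journals by single linkage above a threshold. All of these choices, and the journal labels and counts, are hypothetical; this is not a reproduction of Narin's method. Within each cluster, journals are then ordered by total citation frequency, as the paragraph above describes.

```python
import math
from itertools import combinations

# Hypothetical citation profiles: cited journal -> {source journal: citations received}.
PROFILES = {
    "P": {"S1": 10, "S2": 8},
    "Q": {"S1": 6, "S2": 5, "S3": 1},
    "R": {"S2": 1, "S3": 9},
    "T": {"S1": 1, "S3": 4},
}
SOURCES = ["S1", "S2", "S3"]


def cosine(u, v):
    """Cosine similarity of two citation-count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_journals(profiles, sources, threshold):
    """Single-link grouping: journals at least `threshold` similar share a cluster.

    Within each cluster, journals are ordered by total citation frequency.
    """
    vectors = {j: [profiles[j].get(s, 0) for s in sources] for j in profiles}
    parent = {j: j for j in profiles}

    def find(j):
        while parent[j] != j:
            parent[j] = parent[parent[j]]  # path halving keeps the trees shallow
            j = parent[j]
        return j

    for a, b in combinations(sorted(profiles), 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for j in sorted(profiles):
        groups.setdefault(find(j), []).append(j)
    clusters = [sorted(members, key=lambda j: -sum(vectors[j])) for members in groups.values()]
    return sorted(clusters)


print(cluster_journals(PROFILES, SOURCES, threshold=0.8))
# → [['P', 'Q'], ['R', 'T']]
```

The threshold plays the part of the cut-off on similarity; lowering it merges clusters, raising it splits them.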

A fourth type of analysis involves the construction of citation networks and
may be described as a multi-step process. Citations are taken from source journals
and used as sources to generate more citations. The relationship between the source
and cited journal is significant throughout the analysis. One of the simplest ways of
considering a citation network is by a sociometric diagram.¹ In this type of diagram,
relationships between a number of references, for example journal articles, monograph
titles, journal titles or authors, are indicated by arrows joining points representing
the references as source and cited items. In this way, the distribution of citations and
the frequency of citation can be seen clearly.

The present paper, as explained above, is concerned only with groupings of
journal articles. A simple citation network is illustrated in Figure 2.

The interrelationship between seven reference points A, B, C, D, E, F and G
is illustrated. Each point represents a reference consisting of an author and article
title. The citation linkages are shown by lines joining the reference points, and the
direction of the relationship is indicated by an arrow from the source to the cited point.
From the diagram, it can be seen that article E is in some way a crucial article because
it has the most incoming arrows; that is, it receives the greatest number of citations. The
diagram represents a great oversimplification of the real situation, as is clearly demonstrated
in the trial networks that are reported in Section 4.0.
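Counting incoming arrows in Figure 2 is a matter of tallying how often each point appears as the cited end of a link. In this sketch the links are taken from the SOURCE search walk-through in this section (A cites B and C; B cites D, E and F; C and D cite E; E and F cite G). Holding each arrow as a (source, cited) pair is an assumption of the sketch, not a file structure proposed in this paper.

```python
from collections import Counter

# Figure 2 as (source, cited) pairs: each pair is one arrow.
FIGURE_2 = [
    ("A", "B"), ("A", "C"),
    ("B", "D"), ("B", "E"), ("B", "F"),
    ("C", "E"), ("D", "E"),
    ("E", "G"), ("F", "G"),
]

incoming = Counter(cited for _source, cited in FIGURE_2)  # incoming arrows per article
most_cited, arrows = incoming.most_common(1)[0]
print(most_cited, arrows)  # → E 3
```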



1 Sociometric diagram: in the context of bibliographic citations, a diagram
indicating the interrelationships between authors or their works.

2 $00 pogo 1^ footnoto. 



There are two methods of establishing a network of journal articles.
Firstly, it is possible to start with a relatively old article, such as G, and, by
using a citation index, identify the articles that cite G. This would identify
articles E and F. The network can be extended by using the citation index again
to find the articles that cite E and F; this procedure would identify articles B,
C and D. These three articles are then checked to see if they have been cited;
this produces article A, together with articles B and E, which have already been
identified. Article A cites both B and C. This method is called a CITED search because
it consists of identifying articles that have been cited. The method is described in
detail in section 2.2.1.

The second method of constructing a citation network is to start with a relatively
recent article and to locate all the relevant citations it contains. These references
are then used as sources to generate more citations. This method has been called a
SOURCE search and is described in detail in section 2.2.2. In Figure 2 the
starting article would be A. Article A makes citations to B and C. Articles B and C
are then located and the relevant citations they make are noted. Article B cites
D, E and F; article C makes only one relevant citation, to E. Article D makes only
one relevant citation, which is also to E. Article E cites article G, and article F
also cites G. The principle behind this method is that a cited reference is treated
as a source reference, and as such it will contain other relevant references, enabling
the network to be extended.

The procedure for both SOURCE and CITED searches is repetitive. The
iteration is known as 'cycling'; once a source or cited item is located, the procedure
is repeated to provide other citation links from the article.
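Cycling is the same loop in both directions; only the look-up differs. The sketch below applies it to the links of Figure 2 as listed in the SOURCE search example: following "is cited by" from G gives the CITED search, following "cites" from A gives the SOURCE search. It ignores dates and relevance, and the names `cycle`, `cites` and `cited_by` are hypothetical.

```python
from collections import deque

# Figure 2 as (source, cited) pairs.
FIGURE_2 = [
    ("A", "B"), ("A", "C"),
    ("B", "D"), ("B", "E"), ("B", "F"),
    ("C", "E"), ("D", "E"),
    ("E", "G"), ("F", "G"),
]


def cycle(start, step):
    """Repeat the search from every newly located article until nothing new turns up.

    `step` maps an article to the articles reached from it in one look-up:
    the articles citing it (CITED search) or the articles it cites (SOURCE search).
    """
    found = {start}
    to_check = deque([start])
    while to_check:
        article = to_check.popleft()
        for linked in step.get(article, []):
            if linked not in found:
                found.add(linked)
                to_check.append(linked)
    found.discard(start)
    return found


cites, cited_by = {}, {}
for source, cited in FIGURE_2:
    cites.setdefault(source, []).append(cited)
    cited_by.setdefault(cited, []).append(source)

print(sorted(cycle("G", cited_by)))  # CITED search from G  → ['A', 'B', 'C', 'D', 'E', 'F']
print(sorted(cycle("A", cites)))     # SOURCE search from A → ['B', 'C', 'D', 'E', 'F', 'G']
```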

The basic process underlying network construction is the selection of suitable
citation linkages that lead to the types of network configuration noted above.
In making a network of realistic dimensions (in terms of search period, number of
references, etc.) it is essential to ensure that the references considered relevant are
restricted to those which further the concentration of the network. If every citation
were followed up, many 'dead ends' would result. The effort involved in searching all
these irrelevant references would be wasted and would not help towards concentrating the
citation linkage at particular points in the network. In practice, networks rarely
evolve towards such tight concentrations as those portrayed in the theoretical cases given,
and where concentration can be developed this is often due to the application of
quite rigid cut-offs on the data, e.g. choosing only certain words in an article title,
key authors, etc. Determination of the relevance of articles for a network is of
great importance and is discussed further in section 2.2.5.

A recently reported study of network construction (Garfield, 1972) on the
subject of 'explosive welding' illustrates how citation networks can be used in a search
strategy. Although Garfield successfully identifies the key articles in the field, he
does not state categorically the criteria for choosing citations from source articles
which would ensure that the proportion of relevant articles was maximised.

Once a full network has been established, it is relatively easy to see where the
concentration of citations takes place and which articles are peripheral, etc., but to
begin with it is very difficult to decide suitable cut-off points. For instance, during
the construction of a network there may occur many single citations. It is difficult
to know whether all of these should be omitted from the final network on the grounds
that they do not allow the articles to be grouped, or whether recently published articles
should be kept as a separate group, to be further studied, because they may not have had
time to be heavily cited. A further problem concerns the relevance of articles in the
network. In the case of Figure 2, article C makes a citation to E, but it may contain,
say, twelve citations. It is difficult to know beforehand whether journal article C is
going to fall in a network that is related to the subject matter at hand. It would be
very difficult indeed to establish a network by taking any journal article at random.
Some preliminary restriction in the number of cited items, preferably bounded by a
subject area, is required. If the search subject is one such as the example given
later in this paper ('short-term memory'), then journal article C cites E, which
is relevant, and may also cite several others not relevant to the network.

In theory the ideal method would be to chart the entire network of citations,
with the result that infrequently cited material would not produce concentrations, and
could ultimately be discarded. If the data base were extremely large, covering for
example all the social sciences or all the physical sciences, then networks would be
established in which there were, from time to time, great concentrations of citations.
The concentrations would represent groupings of articles, where the
interrelationship between articles in the groups was strong. These groups
may also be weakly related to other groups by infrequently cited articles. A
good deal of trial and error is necessary in the first place to ensure that the
citations, as they are traced, converge upon a grouping of articles rather than
diverge in all directions to produce very weak linkages with an enormous number
of articles, perhaps across many subject fields.

2.2 Procedures for retrieval of information using citation data. 

There are three methods of constructing citation networks. The first two have
been discussed briefly in section 2.1. Each method is discussed in detail below.

2.2.1 CITED search

This method is adopted when a user wishes to identify articles
published subsequently to his chosen starting reference. The method is illustrated
by the flow-chart shown in Figure 3.

The method has been called CITED search because it basically consists of
locating all articles that have cited the starting reference and then locating articles
that have subsequently cited these. A citation index is used for this purpose.
Figure 4a shows how a citation network is formed following the procedures outlined
in the flow-chart. Each year of the search period is checked to see if the starting
reference(s) has been cited. In 1967, article A cited the starting reference.
Article A is then checked to see if it has been cited by any relevant articles within
the specified search period. It has been cited by articles B and C, and these are
considered relevant. Articles B and C are then flagged as LINK references.
The term 'LINK' reference has been used because these references provide further
citation links if the network is to be extended. Further years of the search period
are checked to see if the starting reference has been cited. This may produce article D
in 1968. Article D is checked to see if it has been cited subsequently; in fact, it
has not. If the citation network is to be extended, the articles that have been flagged
as LINK references can be recorded and then checked to see if they have been
subsequently cited. The network is complete when all the citations which have been
flagged as LINK references have been checked to see whether they have been cited in
the specified search period.
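A CITED search over annual volumes of a citation index can be sketched as below. The index layout (year mapped to cited item mapped to citing articles) and the years given to B and C are hypothetical; the articles themselves follow the example above (A cites the starting reference S in 1967, B and C cite A, D cites S in 1968). The sketch checks every annual volume for one item before moving on to the next flagged LINK reference, so its order of working differs slightly from the flow-chart, but it arrives at the same links. The `is_relevant` argument stands in for the relevance judgement discussed in section 2.2.5.

```python
from collections import deque

# Hypothetical annual volumes of a citation index: year -> {cited item: [citing articles]}.
CITATION_INDEX = {
    1967: {"S": ["A"]},
    1968: {"S": ["D"], "A": ["B"]},
    1969: {"A": ["C"]},
}


def cited_search(start, index, first_year, last_year, is_relevant=lambda article: True):
    """CITED search: return the (citing, cited) links found in the search period.

    Every relevant citing article is flagged as a LINK reference and later
    looked up in each annual volume in turn. The search is complete when no
    flagged LINK reference is left unchecked.
    """
    links = []
    flagged = {start}
    link_references = deque([start])
    while link_references:
        article = link_references.popleft()
        for year in range(first_year, last_year + 1):  # one annual volume at a time
            for citing in index.get(year, {}).get(article, []):
                if not is_relevant(citing):
                    continue
                links.append((citing, article))
                if citing not in flagged:
                    flagged.add(citing)
                    link_references.append(citing)
    return links


print(cited_search("S", CITATION_INDEX, 1967, 1969))
# → [('A', 'S'), ('D', 'S'), ('B', 'A'), ('C', 'A')]
```

Passing, for example, `is_relevant=lambda a: a != "C"` would drop article C and its links, which is how a relevance cut-off narrows the network.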
2.2.2 SOURCE search

This method is adopted when a user wishes to identify a selection of
articles published prior to the starting reference. The method is illustrated by the
flow-chart shown in Figure 3. The method consists of taking citations from a
starting reference, locating them and then using them as sources to provide citations.
Figure 4b shows how a citation network is formed following the procedures outlined
in the flow-chart. The first citation (A) is taken from the starting reference (S).
If the article is relevant it is recorded in the network and the article is located.
Citations from A are checked for relevance and the relevant ones are then flagged
as LINK references; in the diagram, articles B and C are flagged in this manner. The
remaining citations from the starting article are then analysed in the same way as A.
Article D is considered relevant and recorded, but it does not make any relevant
citations. At this point the network cannot be extended from D, which may be considered
a 'dead end'. To extend the network, the citations that have been flagged as
LINK references are treated as sources. This process may provide further
citations flagged as LINK references. The network is complete when the articles
cited by the LINK references are all published outside the dates of the search period.
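The SOURCE search can be sketched in the same style. The bibliographic data below is hypothetical: every article is assumed to have a known publication year and reference list, which a real data base would not always supply. Citations are examined in the order in which they were cited, anything dated outside the search period (or judged irrelevant) is skipped, and an article that makes no relevant citation is reported as a dead end, as article D is in the example above.

```python
from collections import deque

# Hypothetical bibliographic data: article -> (year of publication, articles it cites).
ARTICLES = {
    "S": (1970, ["A", "D", "X"]),
    "A": (1969, ["B", "C"]),
    "D": (1968, ["Y"]),
    "B": (1968, ["C", "Z"]),
    "C": (1967, []),
    "X": (1960, []),
    "Y": (1955, []),
    "Z": (1962, []),
}


def source_search(start, data, first_year, last_year, is_relevant=lambda article: True):
    """SOURCE search: follow citations backwards from the starting reference.

    Returns the (source, cited) links kept and the articles that made no
    relevant citation within the search period ('dead ends').
    """
    links, dead_ends = [], []
    flagged = {start}
    link_references = deque([start])
    while link_references:
        source = link_references.popleft()
        kept = 0
        for cited in data[source][1]:  # examined in the order in which they were cited
            year = data[cited][0]
            if not (first_year <= year <= last_year and is_relevant(cited)):
                continue  # outside the search period, or judged irrelevant
            links.append((source, cited))
            kept += 1
            if cited not in flagged:
                flagged.add(cited)
                link_references.append(cited)  # treated as a source in a later cycle
        if kept == 0 and source != start:
            dead_ends.append(source)
    return links, dead_ends


links, dead_ends = source_search("S", ARTICLES, 1966, 1970)
print(links)      # → [('S', 'A'), ('S', 'D'), ('A', 'B'), ('A', 'C'), ('B', 'C')]
print(dead_ends)  # → ['D', 'C']
```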

2.2.3 DUAL search 

This method uses both SOURCE and CITED methods (hence the name DUAL).
The method is adopted when the user wishes to identify a selection of articles which are
both prior and subsequent to the starting reference. For example, the user may select
a reference published in 1970, but may wish to obtain articles published between 1968 and 1972.
If this is the case, a SOURCE search is adopted for 1968-70. All references
retrieved in this search are then used as LINK references for a CITED search for 1970-72.
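The 1968-72 example can be sketched as two cycles run one after the other: a backward (SOURCE) cycle limited to 1968-70, then a forward (CITED) cycle limited to 1970-72 that starts from everything the first cycle retrieved. The data is hypothetical, relevance judgements are omitted, and `cited_by` is found by scanning every reference list; in practice a citation index would supply that look-up.

```python
# Hypothetical data: article -> (year of publication, articles it cites).
ARTICLES = {
    "S": (1970, ["A", "B"]),
    "A": (1969, ["C"]),
    "B": (1968, []),
    "C": (1965, []),
    "E": (1971, ["A"]),
    "F": (1972, ["E"]),
    "G": (1971, ["C"]),
}


def dual_search(start, data, first_year, start_year, last_year):
    """DUAL search: a SOURCE search back to first_year, then a CITED search
    forward to last_year using everything retrieved as LINK references."""

    def cites(article):
        return data[article][1]

    def cited_by(article):
        return [other for other, (_, refs) in data.items() if article in refs]

    def cycle(seeds, step, low, high):
        found, to_check = set(seeds), list(seeds)
        while to_check:
            for linked in step(to_check.pop()):
                if low <= data[linked][0] <= high and linked not in found:
                    found.add(linked)
                    to_check.append(linked)
        return found

    earlier = cycle([start], cites, first_year, start_year)  # SOURCE search, 1968-70
    return cycle(earlier, cited_by, start_year, last_year)   # CITED search, 1970-72


print(sorted(dual_search("S", ARTICLES, 1968, 1970, 1972)))
# → ['A', 'B', 'E', 'F', 'S']
```

Article G is not retrieved: it cites only C, and C (1965) falls outside the SOURCE period, so it never becomes a LINK reference.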

2.2.4 Differences in the procedures for a CITED and a SOURCE search

The procedures for conducting CITED and SOURCE searches are different.
For a CITED search a citation index can be used. SCI is cumulated into annual
volumes, so that each year of the search period has to be taken in turn. However, a
five-year cumulation has been produced for 1965-69, and other such cumulations can be
expected; these greatly reduce and simplify the search procedure.
In a SOURCE search a citation index cannot be used. A SOURCE search
can only be conducted by locating the actual articles and examining their citations.
At present there is no publication that lists citations from given source articles (although
of course SCI tapes record this information). This search is not conducted by taking
each year of the search period in turn. Instead, all citations with a date of publication
within the search period are of potential value to the network, and are examined in the
order in which they were cited.

2.2.5 Relevance of articles comprising the citation network 

Implicit in the method of retrieving information by the construction of a
citation network is the assumption that articles contained in it are relevant. The
articles are judged relevant by two main criteria: (i) search period, and (ii) subject.

(i) Search period. This is the number of years the user wishes the data base
to be searched. During a CITED search the data base is searched, for each
year of the search period, for citations to the starting reference and subsequently
for citations to the LINK references. When a SOURCE search takes place, the
date of each citation that is made from the starting reference and from the
subsequent LINK references is checked to see if it is within the search period.
If it is, it is considered relevant to the network; if not, it is discarded. The
scatter of citations within a network represents their position at the particular
time covered by the network. If the network is extended to cover more years, or
reduced to cover fewer years, the scatter of citations and the strength of
relationships will change.

(ii) Subject. It is impracticable to trace all citations from all articles, as
some articles may cite a large number of items. The network would become
extremely large and contain many irrelevant references. Some limiting
criteria will have to be used if the network generating procedure is to be
practicable. The criteria used should help concentrate the network linkages
on a few articles, in order to avoid linkages diverging to many articles that are
cited once only. As yet there is no clear way of choosing a suitable criterion
that works towards this end. An indication of the relevance of articles may be
obtained by examining the article title, the author and the journal, but this
procedure is necessarily subjective, and objective criteria would be much more
satisfactory.
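The two criteria can be combined into a single yes/no test, as in the sketch below. It is only one way of making the subjective judgement explicit. Accepting a match on any one of title words, author or journal is an assumption, as are the class and function names. The phrase 'short-term memory' is taken from the example used in this paper; the authors, titles and journals are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    author: str
    title: str
    journal: str
    year: int


def make_relevance_test(first_year, last_year, title_words=(), authors=(), journals=()):
    """Build a relevance judgement from criterion (i), the search period, and
    criterion (ii), a subject check on title, author or journal.
    Accepting a match on any one of the three subject clues is an assumption."""
    words = [w.lower() for w in title_words]

    def is_relevant(ref):
        if not first_year <= ref.year <= last_year:
            return False  # criterion (i): outside the search period
        title = ref.title.lower()
        return (any(w in title for w in words)
                or ref.author in authors
                or ref.journal in journals)

    return is_relevant


relevant = make_relevance_test(1968, 1972, title_words=["short-term memory"])
print(relevant(Reference("Author X", "Short-term memory and rehearsal", "Journal Y", 1969)))  # → True
print(relevant(Reference("Author X", "Short-term memory and rehearsal", "Journal Y", 1963)))  # → False
```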

2.2.6 The starting reference

A starting reference has to be selected. Any reference may be selected so
long as it makes citations if a SOURCE search is to be conducted, or is cited if a CITED
search is to be carried out. However, the number of irrelevant references in the network
is considerably reduced, and the number of relevant references increased, if the reference
selected is a key and central work in the field. The number of irrelevant references is
also reduced if the starting reference is a review article. However, within limits it
does not matter which article is chosen to begin the search, for if enough LINK references
are formed, a very similar network may be constructed from any starting reference. The
number of irrelevant references in the citation network is also reduced if the date of the
starting reference is close to the specified search period. This avoids the extension
of a citation network outside the search period. For example, if articles are required
from 1968-72 and the starting reference is dated 1965, it may be necessary to trace
citations through articles in 1965-68 until articles in the search period appear, although
articles in 1965-67 will not be required.

Once a network has been constructed, the selected starting article does not
necessarily assume a special position in the network. The relationship of the starting
article to all other articles in the network depends upon the strength of its citation
linkages to the other articles. Therefore the fact that one article is chosen to start
the search as opposed to another should not affect the final network. This assumes
of course that the starting article is relevant to the subject of the search.

2.2.7 Preliminary considerations of the data requirements for construction of
citation networks by machine

Any large-scale information service based on the production of citation networks
requires a machine-readable data base, since several hundred citations may have to be analysed
before a meaningful citation network is produced. The problems of compiling such a data
base may be enormous. Only a few points are mentioned briefly below.
Firstly, the selection of source journals from which to collect the citations is
critical to any information service. This problem is fully dealt with in Working Paper 5.
The important point is that any bias in the selection of source journals would be reflected
in the citation networks. In general most citations are to a relatively small number of journal
titles, articles and authors; the selection of source journals in a broad subject field may
produce unforeseen bias. However, in narrow or closely knit subject fields citation
patterns may be more various and source journals need selecting with greater care.

All citations from each source journal must be put into the data base. Missing
citations may cause gaps in the citation network and create artificial cut-off points in the
network. The number of data fields for each citation must be considered. It may prove
feasible only to collect journal title, year, volume, article page numbers and author.
Article title may or may not be an important data field, depending upon the type of search
the user requires. If a user wishes to search by keywords in titles, the article title is
obviously required. However, if a straight reference search is required, articles can be
uniquely identified by the other five data fields. In fact, in some cases it may be desirable
for searches to be conducted using the five data fields, for citations do not always give the
titles of articles.
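A record holding the five identifying fields, with the title optional, might look like the sketch below. The field names, the use of a string for page numbers, and the lower-casing of journal and author before comparison are assumptions made for illustration. The sketch shows that two citations to the same article still match when only one of them gives the title.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CitationRecord:
    """One citation as it might be held in a machine-readable data base.

    The five identifying fields follow the text; the title is optional
    because citations do not always give it.
    """
    journal: str
    year: int
    volume: int
    pages: str
    author: str
    title: Optional[str] = None

    def key(self):
        """Identify the cited article by the five fields alone, ignoring the title."""
        return (self.journal.lower(), self.year, self.volume, self.pages, self.author.lower())


with_title = CitationRecord("Journal Y", 1969, 12, "101-115", "Author X", "A hypothetical title")
without_title = CitationRecord("journal y", 1969, 12, "101-115", "author x")
print(with_title.key() == without_title.key())  # → True
```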

A final point is the number of years the data base should span. If retrospective
searches are to be carried out, the data base should span a fair number of years. Three years
should be considered the minimum period for building up a citation network. A shorter
period is not adequate, as usually only one link can be established between the
source and cited item, and no further link can be made from these; it may take a year or
more for any article that has been published to be cited.
3.0 APPLICATIONS

Information services based upon citation data have many applications by
virtue of the flexibility which the networking technique offers. The following
applications are being considered by DISISS, though not all of them involve actual
networks.

3.1 Packages of information

The amount of literature in most fields is growing so quickly that it is very
difficult for users to locate and extract the information they require. The problem is
eased if information is packaged according to specified user requirements. A package
would consist of a list of all the references produced by a citation network.

Such a list could, in theory, be arranged in any manner to meet the requirements
of the user. Frequency lists of cited authors and articles could be produced; from these,
the user could see at a glance the most important works or authors in the field, and would
not have to handle material peripheral to his needs. If required, the package could
also contain the citation network; from this the user could determine the relationship of one
work with another. The use of citation data provides an ideal method of producing
packages of information for groups of users, as the information required may easily be
extracted according to specified criteria from any network.
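A minimal sketch of how the frequency lists might be produced, assuming the network is held as (citing, cited) pairs together with a lookup from each cited item to its author. The function name `frequency_lists` and the example data are hypothetical, and a real package would also have to handle items with several authors.

```python
from collections import Counter


def frequency_lists(links, author_of):
    """Rank cited articles and cited authors by how often they are cited.

    links: iterable of (citing_id, cited_id) pairs forming the network.
    author_of: dict mapping each cited_id to a single author name.
    Returns two lists of (item, count), most frequently cited first.
    """
    article_counts = Counter(cited for _citing, cited in links)
    author_counts = Counter()
    for cited, n in article_counts.items():
        author_counts[author_of[cited]] += n   # credit each citation to the author
    return article_counts.most_common(), author_counts.most_common()


links = [("p1", "a"), ("p2", "a"), ("p2", "b"), ("p3", "c")]
author_of = {"a": "Jones", "b": "Jones", "c": "Brown"}
articles, authors = frequency_lists(links, author_of)
```

Here article `a` heads the article list with two citations, and Jones heads the author list with three, because both of his items are counted.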

Packages may be developed for particular purposes, not all of them concerned 
with actual retrieval: 

a) To study the occurrence and growth of new concepts.

Packages of information could help to identify new approaches and
concepts developing within the social sciences. This may be particularly
important for the social sciences, where many theories may exist to explain
one particular phenomenon, where schools of thought may die as quickly as
they are born, where terminology is unstable, and where there are no 'hard'
scientific explanations. The use of traditional indexing services to trace
new concepts is particularly difficult in the social sciences because indexing
terms can be assigned only when the terms are fully recognised.



b) To study the structure of a discipline and the history of knowledge.
A large citation network could describe the structure of a discipline
(that is, if all the citations within a given subject field were traced for
a stated time period). The citation linkages may be considered to
indicate the position of an article in the existing body of knowledge. A
user may be particularly interested in tracing the evolution of a field;
a citation network can indicate the importance of each article in the
development of the field. By tracing citations from articles backwards
in time (a SOURCE search), the article that first put forward a new
concept may be determined. Studies of this kind are more difficult
to carry out with a conventional indexing service.

c) To locate information in interdisciplinary areas. The retrieval of
information using citation data provides a good method of meeting the
information requirements in interdisciplinary fields. Citation networks
can be constructed to cut across subject boundaries, provided that the
data base contains journals from the disciplines involved.
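Tracing citations backwards in time, as in item b), is a graph traversal. The sketch below assumes each article's references and publication year are available as dictionaries (a hypothetical layout) and follows every reference breadth-first. The earliest article it reaches is only a candidate for the one that first put forward the concept, since the network may be cut off or the oldest item may be peripheral.

```python
from collections import deque


def source_search(start, references, year):
    """Follow references backwards from `start` (a SOURCE search).

    references: dict mapping an article id to the ids it cites.
    year: dict mapping every article id to its publication year.
    Returns (all articles reached, earliest article reached).
    """
    seen = {start}
    queue = deque([start])
    while queue:
        article = queue.popleft()
        for cited in references.get(article, []):
            if cited not in seen:          # visit each article once
                seen.add(cited)
                queue.append(cited)
    earliest = min(seen, key=lambda a: year[a])
    return seen, earliest


refs = {"r1971": ["r1969", "r1961"], "r1969": ["r1958"], "r1961": ["r1952"]}
years = {"r1971": 1971, "r1969": 1969, "r1961": 1961, "r1958": 1958, "r1952": 1952}
found, first = source_search("r1971", refs, years)
```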

3.2 Current awareness services 

The Institute for Scientific Information (ISI) developed a current awareness
system, ASCA (Automatic Subject Citation Alert), several years ago. It is based on
SCI. Each week user profiles are matched with SCI computer entries. If a current
article cites any item specified in the user profile, the article is retrieved as being
relevant to the user. User profiles are drawn up to include words, phrases and word
stems of article titles, plus authors and journal titles.
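A minimal sketch of profile matching in the spirit of ASCA, assuming a simple dictionary layout for each week's new articles. The `Profile` fields, the prefix matching of word stems and the rule that any single match retrieves an article are illustrative assumptions, not ISI's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical user profile: cited items, title stems, authors, journals."""
    cited_items: set = field(default_factory=set)
    title_stems: set = field(default_factory=set)   # lower-case word stems
    authors: set = field(default_factory=set)
    journals: set = field(default_factory=set)


def matches(profile, article):
    """article: dict with 'references', 'title', 'authors' and 'journal'."""
    if profile.cited_items & set(article["references"]):
        return True                                  # cites a profile item
    words = article["title"].lower().split()
    if any(w.startswith(s) for w in words for s in profile.title_stems):
        return True                                  # title word matches a stem
    if profile.authors & set(article["authors"]):
        return True
    return article["journal"] in profile.journals


def weekly_alert(profile, new_articles):
    """Return this week's articles judged relevant to the profile."""
    return [a for a in new_articles if matches(profile, a)]


p = Profile(cited_items={"ref-1959"}, title_stems={"memor"})
week = [
    {"references": ["ref-1959"], "title": "Recall", "authors": ["X"], "journal": "J1"},
    {"references": [], "title": "Short-term memory loss", "authors": ["Y"], "journal": "J2"},
    {"references": [], "title": "Attitudes", "authors": ["Z"], "journal": "J3"},
]
hits = weekly_alert(p, week)
```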

3.3 Computerised retrieval techniques 

The retrieval of information using citation data is suitable for computer-based
systems. Such information systems may be on-line, in which case the user may hold a
'conversation' with the machine in order to retrieve the exact information he requires.
As the user has only to provide a starting reference, the problem of putting user requests
into a special form that the machine can use is avoided. Equally, batch processing can
be used for searches in the same way, though it does not allow 'conversation' with the machine.
3.4 Review series

Citation networks help to indicate the importance of authors and articles in a
particular field. This type of data may be of particular relevance for the compilation of
review series. Citation networks can delimit the subject field and identify the key authors and
papers; indicate the total field of which the particular review is a subset; provide fairly
complete and representative coverage; and ensure currency. At the moment there is no
way of ensuring that reviews are adequately covering all the relevant material; the
review depends very much upon the ability of the reviewer to locate the relevant material
and then to make value judgements on the importance of each work and its contribution to
the field.

3.5 Foreign language translations 

The identification of frequently cited authors and articles may also be particularly
important for foreign-language translation services. These services may wish to translate
only the most important works in a field. Rather than relying on personal expertise, where an
element of bias is unavoidable and personal knowledge is bound to be limited, citation
data provides some evidence of the works most likely to be wanted in translation.

3.6 Teaching aids

Packages of information may be produced specifically for teaching purposes and
as study aids. Groups of articles can be identified that deal with particular aspects of a 
subject. For instance, the groups could deal with: 

a) historical development of a subject 

b) a broad outline of the subject 

c) detailed aspects of the subject.

Packages may be developed according to detailed specification, so that 
the information given to a student is directly relevant to his needs. Packages developed 
in this manner could be used for programmed learning courses.
3.7 Aids to research in information science

The structure of a discipline as revealed by citation networks may provide
data on, for example, the shifting of subject boundaries within the social sciences. One
traditional means of information retrieval is by classification, but as subject boundaries are
not clear-cut or stable, a means of identifying changes is useful.




4.0 PILOT STUDIES 

Several pilot studies were carried out to investigate the techniques and
procedures for grouping articles using citation data. Science Citation Index (SCI)
was used for the studies; because it does not contain sources published before 1964,
it was necessary in some studies to use source journals not included in SCI in order
to complete the study.

4.1 Studies to identify groups of articles with keywords in the title 

The main objective of these studies was to retrieve articles with chosen keywords
in the titles and to identify the core articles and the peripheral articles on the topic.
The keywords chosen were 'short-term memory' and 'deviant behaviour', or synonyms.
A citation network was constructed for each using citation frequency data.

4.1.1 Short-term memory - a DUAL search

The objective of this study was to retrieve articles published between 1959 and 1970 with
the keywords 'short-term memory' or synonyms in the title of the article. The resulting
network is shown in Figure 5. The starting reference was no. 1. The 1969 SCI(1) was
used to identify any relevant citations made in 1969 to reference no. 1; the procedure
was that of a CITED search. Twenty-two citations were identified as relevant, and these
are indicated by nos. 36-57 in Figure 5. The network was extended by using a SOURCE
search to identify any relevant articles cited by the twenty-two articles. This
procedure located the articles numbered 2-35. Citations from these articles were not
checked, because the number of references comprising the network was becoming too
unwieldy to deal with by hand. The resulting network is very incomplete, being based
only on citations made in 1969 to the starting reference. Ideally, each of the years
spanned by the network should be checked for citations to the starting reference, and the
citations arising from these should similarly be checked for relevance.
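The two steps of this DUAL search can be sketched as follows, assuming the citation index is available as two in-memory dictionaries (a hypothetical layout; in the study SCI was consulted by hand) and that relevance is judged by a title test. In the study, `start` corresponds to reference 1, the CITED step yields references 36-57 and the SOURCE step yields references 2-35.

```python
def dual_search(start, cited_by, references, relevant):
    """One round of a DUAL search: a CITED step followed by a SOURCE step.

    cited_by: dict, article id -> ids of later articles citing it.
    references: dict, article id -> ids of the items it cites.
    relevant: predicate deciding whether a candidate article is kept.
    """
    # CITED step: later articles that cite the starting reference.
    citing = {a for a in cited_by.get(start, []) if relevant(a)}
    # SOURCE step: relevant items cited by those articles (excluding the start).
    cited = set()
    for a in citing:
        cited.update(r for r in references.get(a, []) if relevant(r) and r != start)
    return citing, cited


titles = {"s": "short-term memory", "c1": "short-term memory decay",
          "c2": "attitudes", "o1": "short-term memory span", "o2": "perception"}
rel = lambda a: "short-term memory" in titles[a]
citing, cited = dual_search("s", {"s": ["c1", "c2"]},
                            {"c1": ["s", "o1", "o2"], "c2": ["o2"]}, rel)
```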

The resulting citation network was presented for evaluation to a person who knew
the subject well. He felt that the articles indicated by the citation network as being
important were indeed important articles in the field. The user was also able to divide the
references into distinct subject groupings, which are given below. The subject groups
are not mutually exclusive.

(1) The 1969 edition was chosen because this was the earliest year of SCI held in Bath
University Library.
(i) paired associate learning. References 3, 6, 7, 8, 21.

(ii) interpolated activity/inertial interval. References 15, 16, 18,
25-30, 36, 38-41, 48-50, 53-55.

(iii) semantic acoustic confusability. References 10, 19, 24, 43, 52, 56.

(iv) proactive/retroactive interference. References 1, 2, 4, 5, 9,
11, 12, 15, 17, 19-23, 31, 33, 44, 45, 47, 55-57.

The user was not familiar with five of the references, 13, 14, 32, 42 and 51, so 
he could make no value judgements on these. Four references, 34, 35, 37 and 43, were 
thought to be of marginal relevance to any of the four subject groups identified.

Although the articles in the network could be classified into subject groups,
the citation linkages in the network were not necessarily between articles within the
same subject group. The starting article concerned proactive/retroactive interference,
and it was to be expected that this subject group would contain more articles than
the other groups. To illustrate the exact nature of the citation linkages, the group on
proactive/retroactive interference will be examined in detail. The starting article,
reference 1, was cited by twenty-two articles in 1969, but only five of these
(nos. 44, 45, 47, 56, 57) were considered as being concerned with proactive/retroactive
interference. Three of the five articles together made seven citations to articles other
than reference 1 in the subject group; these are listed below.

From reference 44: to reference 34.

From reference 47: to references 5, 19.

From reference 57: to references 4, 5, 11, 26.

Within the subject group, four of the citation links were to two articles only:
references 4 and 5 were both cited twice. This is an indication of the concentration of
citation links on specific references. If the network had been extended by tracing
the citation links through all years (1959-1969), it is envisaged that the citation
links within subject groups would be more evident and further concentrations of citation
links would occur, as each article would have a greater chance of being cited.
The subject group with the next largest number of articles in it is that on
interpolated activity/inertial interval. Citation linkages do occur within this group;
they are listed below.

From reference 36: to references 16 and 25.
From reference 38: to references 27, 28, 29.
From reference 41: to reference 30.
From reference 50: to references 15, 18, 30.

Only one article is cited twice from within the group: reference 30,
cited by references 41 and 50.
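The concentration on reference 30 can be checked by counting how many links point at each reference (its in-degree). The links below are those listed for the interpolated activity group, reading the final link from reference 50 as including reference 18, which is an assumption where the original is hard to read; the variable names are illustrative.

```python
from collections import Counter

# Citation links within the interpolated activity group: citing -> cited.
links = {
    36: [16, 25],
    38: [27, 28, 29],
    41: [30],
    50: [15, 18, 30],
}

# In-degree: how many links within the group point at each reference.
in_degree = Counter(cited for refs in links.values() for cited in refs)
concentrated = [ref for ref, n in in_degree.items() if n > 1]
print(concentrated)  # -> [30]
```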

There are only five and six articles respectively in the remaining two subject
groups, paired associate learning and semantic acoustic confusability. No citation
linkages occur within either group, and this can be attributed to the small number of
articles in the groups.

4.1.2 Deviant behaviour - a SOURCE search

The main objective of the study was to retrieve articles between 1962 and 1969 with
the keywords 'deviant behaviour' or synonyms in the title of the article. The starting
reference was no. 13 in Figure 6. A SOURCE search was first carried out. The starting
reference made five citations to other works that were relevant for the search: these are
references 1, 6, 9, 10 and 12. Each of these references was checked for any relevant
citations it might make. The citations made in reference 1 all went outside the years of
the search period. Reference 9 made five relevant citations: references 3, 4, 5,
7 and 8. Each of these references was then checked for citations, but none contained any
relevant citations within the search period. To increase the number of articles in the
network, the term 'deviant behaviour' was looked up in the Permuterm Subject Index for
1969; two articles were identified, references 14 and 15. Reference 14 cited reference 2
and reinforced citation linkages to references 3 and 4. Reference 15 made only one
relevant citation, to reference 11.
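The restricted search used here, which follows a citation only when the cited title contains the keyword and the cited item falls within the search period, could be sketched as follows. The stem `"devian"` stands in for 'deviant behaviour' and its synonyms; that choice, the data and the function name are illustrative assumptions.

```python
def restricted_source_search(start, references, year, title, keywords, period):
    """SOURCE search limited by title keywords and an inclusive year period.

    references: dict, article id -> ids it cites.
    year, title: dicts giving each article's year and title.
    keywords: lower-case substrings, any one of which must occur in a title.
    period: (first_year, last_year) of the search.
    """
    first, last = period

    def keep(a):
        return (first <= year[a] <= last
                and any(k in title[a].lower() for k in keywords))

    found, frontier = {start}, [start]
    while frontier:                      # expand one citation step at a time
        next_frontier = []
        for a in frontier:
            for r in references.get(a, []):
                if r not in found and keep(r):
                    found.add(r)
                    next_frontier.append(r)
        frontier = next_frontier
    return found


references = {"s": ["a", "b", "old"], "a": ["c"]}
year = {"s": 1969, "a": 1966, "b": 1964, "old": 1955, "c": 1963}
title = {"s": "Deviant behaviour", "a": "Deviance and control", "b": "Voting",
         "old": "Deviant behaviour", "c": "Deviant groups"}
found = restricted_source_search("s", references, year, title, ["devian"], (1962, 1969))
```

Item `old` is dropped for falling outside the period and `b` for its title, mirroring how references in the study were lost when their citations went outside the search years.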
The total network was composed of 15 references only. The small number
may be due to the fact that the term 'deviant behaviour' is very broad: article titles
may use more specific terms, or they may use other terms for the same subject. The
search would no doubt have located more references if all the citations from each article
had been used, instead of only the citations that had the keywords in the title.

The network was presented for evaluation to an academic researcher working
in the field of deviant behaviour. He judged that most of the articles that occurred in
the network were important in the field. Reference 14 was not known to the user, although
he thought it would be relevant. The reason he had not come across the reference
before was that it was in the Journal of Mental Hygiene, a journal that sociologists do not
usually scan. The starting article gave a general treatment to deviancy, and the majority
of references retrieved also treated the subject in this manner. Only references 8 and 18
dealt with more specific aspects of deviancy.

4.2 Studies to identify articles containing new concepts, new ideas and new terms. 

The main objective of these studies was to see how well an information retrieval
procedure based on citation data would perform in handling literature that dealt with new
concepts, new ideas or new terminology in the social sciences.

Fifteen topics were chosen at random as terms that have been introduced into
social science literature over the past ten years. Each term was looked up in ISI's
Permuterm Subject Index for the years 1969, 1970 and 1971, and the number of articles
with the term in the article title was recorded. The results are shown in the following list.
Term                          1969   1970   1971*
Autistic children               16     12      3
Autokinetic                      7     16      5
Biofeedback                      0      0      1
Bionics                          6      1      2
Cognitive control(s)             3      4
Generation gap(s)                8      5     12
Orienting response              13     12      ?
Phillips curve(s)                0      2      0
Preparatory response(s)          1             ?
Representational processes       0      0      0
Skin resistance(s)              21     12      9
Synectics                        2      1      2
T-groups                        29      1      4
Transcendental meditation        0      2      0
Yoga                             1      3      0

* Data available only for January to September 1971.

The list indicates in a very preliminary way the frequency of use of the terms in
the literature. Terms that occur infrequently in all three years can be interpreted as being
of fairly recent origin, or as showing that only a few people were publishing work in that
particular field. Terms occurring frequently in 1969 and declining in 1970 and 1971
may be interpreted as being of declining importance in the literature; an example of
this appears to be T-groups. However, as the data was only available over three years
(only these three volumes of the Permuterm Subject Index were available in Bath University
Library), firm conclusions cannot be reached at this stage. If data were available over a
period of ten years, the conclusions would have more validity. A further limitation of the
present study is that it is based only on ISI's files; subject coverage of some of the terms
in the above list may not be comprehensive.
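The kind of reading applied to the list above can be made explicit with a simple rule. The sketch uses only rows whose three counts are clearly legible. The thresholds (every year below 5 for 'infrequent'; a 1969 count of at least 10 with both later years under half of it for 'declining') are arbitrary assumptions, and the partial 1971 figures are not scaled up, so the labels are no firmer than the text's own conclusions.

```python
# Legible rows of the Permuterm list above: (1969, 1970, 1971 Jan-Sept).
counts = {
    "Autistic children": (16, 12, 3),
    "Autokinetic": (7, 16, 5),
    "Bionics": (6, 1, 2),
    "Generation gap(s)": (8, 5, 12),
    "Phillips curve(s)": (0, 2, 0),
    "Skin resistance(s)": (21, 12, 9),
    "Synectics": (2, 1, 2),
    "T-groups": (29, 1, 4),
    "Transcendental meditation": (0, 2, 0),
    "Yoga": (1, 3, 0),
}


def classify(c1969, c1970, c1971, low=5, frequent=10):
    """Label a term's trend using illustrative, hypothetical thresholds."""
    if max(c1969, c1970, c1971) < low:
        return "infrequent"        # few titles in every year
    if c1969 >= frequent and max(c1970, c1971) < c1969 / 2:
        return "declining"         # common in 1969, under half that later
    return "unclear"


labels = {term: classify(*c) for term, c in counts.items()}
```

With these thresholds only T-groups is labelled as declining, which agrees with the example given in the text.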

The procedure for identifying the articles in the studies was as follows. The
articles occurring in the 1971 Permuterm Subject Index were located in the Source Index
to obtain full bibliographical details and the number of citations each reference
made. It was decided to work from the most recent data available, namely 1971. Those
articles that did not make any citations were discarded from the study, for they could not
provide links to other articles. The references with the greatest number of citations were
located first. Ideally a total citation network should be constructed by tracing all citations
from every article. This is a large task for a manual operation; the procedure adopted was to
identify first the citations to the oldest material, the assumption being that these articles
were the ones most likely to lead to the articles that first put forward the new concepts.
The basic procedure used was that of a SOURCE search described in section 2.2.2. It was
modified by selecting the earliest citations from the articles instead of tracing citations
from all articles. To validate the choice of citations used in the network, the text of each
article was read, and this in some cases indicated the earliest relevant citation. As the
terms were traced back an indefinite number of years, the studies invariably went outside
the scope of ISI's data base. Each article was obtained and the citations from it were
scanned. If the publication was not available in Bath University Library, a request was
made to the National Lending Library for Science and Technology. Sometimes a
publication was unobtainable because it was very old and insufficient bibliographical
details were available to enable it to be located. When this occurred the network was
artificially cut off, and work could not proceed further unless other publications in the
network led to the publication that could not be located.
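The modified SOURCE search just described, which follows only the earliest citation from each article and stops when a publication cannot be obtained, might be sketched as below. `fetch_references` is a hypothetical stand-in for obtaining a publication and scanning its references; it returns `None` for an unobtainable item. The `max_steps` bound is an added safety assumption, not part of the original procedure.

```python
def trace_earliest(start, fetch_references, year, max_steps=50):
    """Follow, from each article, only its earliest-dated citation.

    fetch_references(article) -> list of cited ids, or None if the
    publication cannot be obtained.
    year: dict mapping article ids to publication years.
    Returns (chain of articles visited, True if artificially cut off).
    """
    chain = [start]
    current = start
    for _ in range(max_steps):
        refs = fetch_references(current)
        if refs is None:
            return chain, True            # unobtainable: artificial cut-off
        if not refs:
            return chain, False           # makes no citations: natural end
        current = min(refs, key=lambda r: year[r])   # earliest citation only
        chain.append(current)
    return chain, False


refs = {"r71": ["r69", "r58"], "r58": ["r52"], "r52": None}
years = {"r71": 1971, "r69": 1969, "r58": 1958, "r52": 1952}
chain, cut_off = trace_earliest("r71", lambda a: refs.get(a, []), years)
```

Here the chain runs 1971, 1958, 1952 and then stops because the 1952 item cannot be obtained, the situation the text describes as an artificial cut-off.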

The results of tracing five terms are given below:

a) T-groups

Reference 1 in Figure 7 was chosen as the starting point in the citation
network because it was a review article and made 75 citations. Analysis
of the citations by title revealed the ones that would be most likely to
give leads to earlier works on the T-group concept. When these
publications were obtained it was found that only two were directly
relevant. These, references 2 and 8, both describe the historical
development of the concept. References 5 and 6 are the earliest
publications dealing specifically with training groups. Reference 7
indicates the historical development of the concept. Reference 3 is
an important publication in the field, for it describes the first experiments
with T-groups and hence cites the reports from the laboratory. The
authors most frequently cited appear to be L. P. Bradford, K. D. Benne and
R. Lippitt; these men could be considered the innovators in the T-group field.

Figure 7.
T-GROUPS CITATION NETWORK

b) Short-term memory

Further work was done on the short-term memory citation network
(Figure 5) to see if the first appearance of the term could be detected.
Using the SCI, no article earlier than Peterson & Peterson, reference 1,
could be found with the keywords in the title.

c) Phillips curve

Only two references were given in the Permuterm Subject Index. Neither
of these made any citations, and so the study could proceed no further for
this term.

d) Synectics

The two references made in 1971 were both by the same author, and both
articles made almost the same citations. The citations used in the network
were common to both articles. References 1 and 2 cited several papers
by W.J.J. Gordon (6) published by Arthur D. Little, Inc., but no dates
of publication were given. Arthur D. Little came into operation in 1952,
so the papers must post-date this. From the text of reference 4 it may be
assumed that these papers were published in 1952 and deal with the results
of the research started in 1944. The first appearance of the term 'synectics'
is in the publication of W.J.J. Gordon, 1961, reference 4.

e) Generation gap

Only a few references located in the Permuterm Subject Index made citations.
Eight, five and twelve references were located respectively in the 1969, 1970
and 1971 Permuterm Subject Index, but only references 1, 2 and 4 made any
citations. The reference making the most citations, and also citations to the
oldest material, was Murray 1971. This article was chosen as the starting
point; of its 33 citations three were identified as being closely linked.
These are shown in Figure 9.
SYNECTICS CITATION NETWORK

GENERATION GAP CITATION NETWORK
References 3 and 4 are both cited by references 1 and 2. From the
citation pattern, it appears that the term 'Generation gap' is associated 
with early studies of adolescence. 

This study was investigated further in order to establish how closely the term
'Generation gap' is associated with early studies of adolescence.

Two approaches were taken:

(i) tracing citations to Mead, M. 1928 in the 1969, 1970 and 1971
SCI; and

(ii) tracing the citations from the two articles identified in the 1971
Permuterm Subject Index.

The resulting networks are shown in Figures 9a and 9b.

From Figure 9a, it can be seen that only 2 authors cited Mead 1928;
Mead, F. made a citation in 1968, but it was entered in the 1969 SCI.
The third author, J. Ablon (1970), cited the revised 1961 edition of the
1928 work. The network shown in Figure 9b was constructed using the
SOURCE search method described in section 2.2.2. The 33 references
cited by Murray (reference 1) and the four references cited by Thomas
(reference 2) were obtained. The citations that these articles made were
checked against each other; only the ones in common are shown in Figure 9b.
It was too large a task to construct by hand a network containing all the
citations from reference 1, as some of these articles contained over 70
citations. A good proportion, but not all, of the articles cited by Murray
could be obtained by the time this paper was written. An interesting
feature of the citation pattern is that in 1967 articles cited other articles
published in the same year: references 7, 9, 10 and 11.
All these articles in fact occur in the same issue of the same journal, so the
authors must have been in close contact with each other's work.
It was hoped that a comparison of the networks shown in Figures 9a and
9b might show some articles in common, but this was not so. The
articles in the network shown in Figure 9b deal mainly with 'student protests'.
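Checking the reference lists of Murray and Thomas against each other amounts to intersecting them, a comparison usually known as bibliographic coupling. The sketch below generalises it to any number of citing articles. The function name and the reference ids in the example are hypothetical.

```python
from itertools import combinations


def coupled_pairs(references, min_shared=1):
    """For each pair of citing articles, list the references they share.

    references: dict mapping an article id to an iterable of cited ids.
    Returns {(a, b): sorted shared ids} for pairs sharing >= min_shared.
    """
    result = {}
    for a, b in combinations(sorted(references), 2):
        shared = set(references[a]) & set(references[b])
        if len(shared) >= min_shared:
            result[(a, b)] = sorted(shared)
    return result


refs = {"murray": ["r3", "r4", "x1", "x2"], "thomas": ["r3", "r4", "y1", "y2"]}
pairs = coupled_pairs(refs)
```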

4.3 Pilot Studies: Problems and Conclusions

The use of citation data to identify groups of articles proved to be successful.
The study on short-term memory (section 4.1.1) showed four subject groupings of
articles, with citation linkages present within two of the groups. There was evidence of
citation linkages concentrating on particular references within the groups. If
the study had been extended and all citations had been traced through all years of the
network rather than 1969 only, citation linkages within all the groups would have increased.

The study using the keyword 'deviant behaviour' (section 4.1.2) did not
show such clear groupings of articles. If the network had been constructed without the
restriction of the keyword, groups of articles would no doubt have appeared. The use
of keywords in this study was too limiting: the keyword chosen was too broad for groups
of articles to be identified by title. However, these studies were purposely conducted
using keywords in titles, because it was felt that this would reduce the number of
references retrieved and would therefore be more suitable for manual handling. If
all the citations from each article had been used in the study, the network would have
become too large.

The studies to identify articles containing new concepts (section 4.2) showed
that although early articles may be identified, there is no firm evidence to suggest
exactly which article first put forward the new concept. This is because, when
constructing the networks, it was difficult to define the relevant articles; in the
study only the earliest citation made by each article was considered relevant.

Previous studies on tracing the history of subjects by citation linkages have not
put forward any objective measures for establishing the relevance of articles. Garfield
has reported several studies on the use of SCI to trace the history of various subjects.
However, his studies were primarily aimed at identifying the important articles in the
development of the subject rather than identifying articles that first put forward a
new concept. It may be that some articles that first put forward a new concept are
considered important articles in their field, but this may not necessarily be true for all of
them. This may account for the fact that no firm conclusions could be reached in the
pilot studies. It may be illuminating at this point to examine the methods which Garfield
used to trace the history of particular subjects.

In 1964 Garfield, Sher and Torpie identified the key DNA discoveries as
described in Isaac Asimov's The Genetic Code, and then carried out a literature search
using conventional bibliographic tools in order to identify the articles that corresponded
to the historical events as described by Asimov. The citation linkages between each article
were examined to see how far the linkages represented the historical events. The study
concluded that citation patterns were a valid means of investigating historical events.
In 1969 Garfield reported a study on the recent history of DNA. From a given list of
30 to 40 articles published in 1967 on the subject of DNA, he compiled a master list of
all citations in their articles. He disregarded all articles in this list cited fewer than
five times. He then checked the list of articles in SCI to make sure that they were all
highly cited articles. He then repeated the process for all years of the study, 1961 to 1967.
In this manner he claimed to have located all the most important articles on the recent
history of DNA. However, Garfield does not explain how he chose the thirty or forty
articles for each year of the study. The choice of his starting articles is critical to his
method: although all unimportant articles are eliminated by checking in the SCI, some
important articles may not have been cited by the starting articles. The method does not
provide a citation network, as only one citation link is used to provide further articles in
each year of the search; it is the method described as a one-step citation analysis in
section 2.1. The choice of the 30 or 40 starting articles is therefore critical in this case,
because any bias in the selection of the starting articles cannot be balanced out by cycling
through all years of the search period. A cycling procedure would provide further citation
linkages which would reaffirm the importance of an article previously identified.
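The one-step procedure described for Garfield's 1969 study can be sketched as follows. This is a hedged reconstruction, not Garfield's own program: the function name `master_list`, the input layout and the choice to count each citing article at most once per item are assumptions, and the later check of the retained items in SCI is not modelled. The sketch makes the weakness noted above visible, since its output depends entirely on the starting set and nothing is cycled back through later years.

```python
from collections import Counter


def master_list(starting_articles, references, min_citations=5):
    """Pool the references of the starting articles and keep items cited
    at least `min_citations` times (each citing article counted once)."""
    counts = Counter()
    for article in starting_articles:
        counts.update(set(references.get(article, [])))
    return {item: n for item, n in counts.items() if n >= min_citations}


# Hypothetical data: six starting articles all cite "key"; two cite "minor".
refs = {f"s{i}": ["key"] + (["minor"] if i < 2 else []) for i in range(6)}
kept = master_list(list(refs), refs)
```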

More recently, Garfield (1970) conducted a study on the history of the design of
electromagnetic flowmeters in order to identify 'edifying' articles. The method used was to
construct a citation network. A known article was selected and the citations it made examined.
The relevance of each article was determined by examining authors, titles, citations
and frequency of citation as indicated in SCI.
The Permuterm Subject Index was also checked for relevant articles, and altogether
about 500 articles were produced, of which 159 were relevant. The final network
contained twenty-six articles. These were identified by using SCI; the greater the
number of citations to an earlier work, the greater was the likelihood that the cited
paper was a key event in the subject field. In this study Garfield used a citation
network to identify the key articles, but he gives little detail of the precise procedure
he followed and of the amount of cycling he had to do.

The outstanding problem revealed by the pilot studies is the successful
identification of relevant citations that enable the networks to be constructed in the
shortest possible time; ideally the articles identified must lead to a concentration of
the network, rather than to a scatter of references that have only been cited once.
Garfield (1970), who has done perhaps most work on the use of citation networks, does
not appear to put forward any definitive method of selecting the appropriate articles.
He suggests the following criteria: author, title, citations made by the article, and
finally the number of times the article has been cited in SCI.
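One hypothetical way to quantify the difference between a concentrated network and a scatter is the share of cited items that receive more than one citation link. The measure below is an illustrative assumption, not a criterion proposed in this paper or by Garfield.

```python
from collections import Counter


def concentration(links):
    """Share of cited items in a network that are cited more than once.

    links: iterable of (citing_id, cited_id) pairs. A value near 0 means a
    scatter of singly cited references; higher values mean citations are
    concentrating on particular items.
    """
    in_degree = Counter(cited for _citing, cited in links)
    if not in_degree:
        return 0.0
    repeated = sum(1 for n in in_degree.values() if n > 1)
    return repeated / len(in_degree)


score = concentration([("a", "x"), ("b", "x"), ("c", "y")])
```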

These criteria are not very satisfactory, for the following reasons. An author
can be used as a criterion only if he is known to the user as having worked in the field.
Key authors may be easy to recognise, but problems arise with the large proportion of
lesser-known authors. The title of an article may certainly be used as an indication of
relevance, but serious misjudgements may occur. The citations an article makes may
give some indication of relevance, but the article must first be located for the
citations to be scanned, and this may be a time-consuming task. Finally, Garfield
suggests checking in SCI, as this would give an indication of the importance of the work.
But if all or some of these procedures have to be carried out for every citation made
by the starting article (some articles may make up to 60 citations or even more) and
then for all the preceding articles, the retrieval process must be extremely slow.
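A rough count makes the point concrete. Assume, purely for illustration, that every article makes the same number c of citations, that each one must be screened before the search moves back a further step, and that the reference lists do not overlap (overlap would reduce the count). A search taken back k steps from a single starting article then involves about N(k) screenings:

```latex
N(k) = \sum_{g=1}^{k} c^{g} = \frac{c\,(c^{k}-1)}{c-1},
\qquad
N(2)\big|_{c=60} = 60 + 3600 = 3660,
\qquad
N(3)\big|_{c=60} = 219\,660 .
```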

Further problems in constructing networks concern the size and structure of
the data base from which the network is compiled. For an effective information service,
the citation data base needs to be very large if a subject or discipline is to be well
covered by the service. Not only that, but it must be composed of a valid selection
of source journals. If the service is to be economically viable, it may be necessary
to make a selection of journals. An integral part of DISISS research is to investigate
the application of …
are being used to identify groups of frequently cited journals. Journals selected in
this manner would provide the basis of an information store for a service. Retrieval of
information from such a service could be carried out by using the methods described in
this paper, based on citation data.



Other problems that are of relevance to citation networks concern the
nature of citation practice. Several papers have been concerned with these problems;
see Garfield and Sher (1963), East and Weyman (1969) and Price (1965). The main
areas for concern are mentioned briefly below.

The major assumption about citation practice is that the contents of the 
article will determine which references an article will cite. However, an author may 
cite a work for other reasons. Some material may be cited because it is so widely 
established that to ignore it would be an omission, whether it was used or not. An 
author may cite material that he has not read, perhaps in order to give more weight 
to his own work. An item may be cited because it supports a particular point of 
marginal relevance to the main theme of the paper. An author may wish to make his
own previous work more widely known, and he may therefore cite it without its being 
of strict relevance to the present work. Finally, there is no way to ensure that the 
author was aware of all the relevant articles when he wrote up his work, nor that he 
cited all works that went towards the writing of his own article. Although citations
are by no means a perfect indicator of use, such deficiencies diminish in importance as a
network grows in size, as individual inadequacies tend to cancel out. In a field
such as social work, where citations are less frequent, and where an author may have made
less effort to search the previous literature than in a 'pure' research field, the
problems may be much greater. 
5.0 FUTURE WORK

DISISS will continue to investigate the use of citation data for the 
identification of groups of articles. It is hoped that it may prove feasible to 
develop information services based on citation data. In particular, DISISS will
look at the use of citation data to construct packages of information, and also at its use
in the selection of foreign language items for translation. The problems that have
been discussed in this paper will be further investigated. Special attention will be 
directed towards the establishment of suitable criteria for the determination of the 
relevance of articles for a citation network. It will not be possible to generate
effective citation networks from the present DISISS citation data files, because the
data for a citation network must be derived from all source journals for each year the
network is to span.

At present the DISISS citation data files are made up as follows. Data 
collected in the citation pilot study is composed of citations from source journals
published in 1950, 1960 and 1970. All citations were collected from every third
article. Data collection has not been completed for all years. This data is not adequate
for the construction of networks, because the gaps in the citation collection would cause
artificial cut-off points in the networks. Also, it was collected over a range of social
science disciplines, so that the data for any one discipline is thin. Data for the main citation
study was collected for 1969-1970 only; this data by itself is again not suitable for
building networks, since the data must be available over a number of years. DISISS
will therefore investigate the possibility of obtaining data from other sources, perhaps
ISI, for the construction of citation networks.
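Why gaps in collection produce artificial cut-off points can be shown with a small sketch. It assumes, hypothetically, that each article whose reference list was collected is stored together with the articles it cites, and that a network is grown by tracing references backwards from a starting article. An article whose own references were never collected cannot be expanded, so tracing stops there even if that article cites further relevant work. The function `trace_backwards` and the article identifiers are invented for illustration and do not describe a DISISS procedure.

```python
from collections import deque


def trace_backwards(seed, references):
    """Follow citations backwards from `seed`.

    `references` maps each article whose reference list was collected to the
    articles it cites. Reached articles missing from the map cannot be
    expanded and are returned as cut-off points."""
    network, cut_offs = set(), set()
    queue = deque([seed])
    while queue:
        article = queue.popleft()
        if article in network:
            continue  # already visited
        network.add(article)
        if article not in references:
            cut_offs.add(article)  # reference list never collected
            continue
        queue.extend(references[article])
    return network, cut_offs


# Hypothetical chain of articles, each citing an earlier one.
complete = {
    "A1970": ["B1965"],
    "B1965": ["C1960"],
    "C1960": ["D1950"],
    "D1950": [],
}
# Simulate sampling in which B1965's reference list was not collected.
sampled = {k: v for k, v in complete.items() if k != "B1965"}

print(trace_backwards("A1970", complete))
print(trace_backwards("A1970", sampled))
```

With the complete toy data the whole chain from A1970 back to D1950 is recovered and there are no cut-off points. When B1965 falls among the articles not sampled, tracing stops at B1965, and C1960 and D1950 are lost from the network even though they are relevant.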